Abstract. Sampling formulas describe probability laws of exchangeable combinatorial 
structures like partitions and compositions. We give a brief account of two known parametric 
families of sampling formulas and add a new family to the list. 

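1 Introduction. By an integer composition of weight $n$ and length $\ell$ we shall mean an
ordered collection of positive integer parts $\lambda = (\lambda_1, \ldots, \lambda_\ell)$; we write $\lambda \vdash n$ for $\sum_j \lambda_j = n$. It
will be convenient to also use the variables $\Lambda_k = \lambda_k + \cdots + \lambda_\ell$, $k \le \ell$, so that $\lambda_j = \Lambda_j - \Lambda_{j+1}$ (with $\Lambda_{\ell+1} = 0$).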

A composition structure is a nonnegative function $q$ on compositions such that for each
$n$ the values $\{q(\lambda) : \lambda \vdash n\}$ comprise a probability distribution, say $q_n$, and the $q_n$'s satisfy
the following sampling consistency condition. Imagine an ordered series of randomly many
nonempty boxes filled in randomly with balls, so that the distribution of occupancy numbers
from left to right is $q_n$. The condition requires that if some $k < n$ balls are sampled
out uniformly at random then the distribution of the reduced occupancy numbers in nonempty
boxes (in the same order) must be exactly $q_{n-k}$ (without loss of generality we can take $k = 1$).
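The consistency condition can be checked mechanically for small $n$. The sketch below is an illustration only: `two_box_law` is a hypothetical example (balls thrown independently into two equally likely ordered boxes, empty boxes discarded) that does not appear in the text, and `remove_one_ball` is an assumed name for the operation of deleting one uniformly chosen ball.

```python
from fractions import Fraction
from math import comb

def two_box_law(n):
    """Hypothetical example: occupancy numbers of n balls thrown independently
    into two ordered boxes with probability 1/2 each, empty boxes dropped."""
    law = {(n,): Fraction(2, 2**n)}          # all balls in the left or in the right box
    for k in range(1, n):
        law[(k, n - k)] = Fraction(comb(n, k), 2**n)
    return law

def remove_one_ball(law_n, n):
    """Law of the reduced composition after deleting one of n balls uniformly at random."""
    reduced = {}
    for lam, prob in law_n.items():
        for j, part in enumerate(lam):
            # The deleted ball sits in box j with probability part / n;
            # a box that becomes empty disappears from the composition.
            shrunk = (part - 1,) if part > 1 else ()
            smaller = lam[:j] + shrunk + lam[j + 1:]
            reduced[smaller] = reduced.get(smaller, 0) + prob * Fraction(part, n)
    return reduced

# Consistency with k = 1: the reduced law of q_n must coincide with q_{n-1}.
consistent = all(remove_one_ball(two_box_law(n), n) == two_box_law(n - 1)
                 for n in range(2, 9))
```

Exact rational arithmetic is used so that the comparison of the two laws is an equality test rather than a numerical tolerance check.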

Ignoring the order of boxes yields Kingman's partition structure [?] (see [?] and [?] for a systematic
development of the theory of partition structures, their relation to exchangeability and
many references). But the relation cannot be uniquely inverted, because for a given partition 
structure there are many ways to introduce the order in a consistent fashion. 

Gnedin [?] showed that all composition structures can be uniquely represented by a version
of Kingman's paintbox construction [?]. Let $U$ be a paintbox, that is, a random open subset of
$[0,1]$. With a paintbox we associate an ordered partition of $[0,1]$ comprised of the intervals
of $U$ and of individual elements of the complement $U^c$, with the order of blocks induced by
the order on reals. Suppose n independent uniform random points are sampled from [0, 1] 
independently of U. The sample points group somehow within the partition blocks and we 
obtain a random composition by writing the nonzero occupation numbers from the left to the 
right. With probability one there is no tie among the sample points and the consistency for 
various sample sizes follows from exchangeability in the sample. 
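As a minimal sketch of the construction just described, the following code reads a composition off a sample for a paintbox given as a finite list of disjoint open intervals. The function names and the particular set $U$ are assumptions made for illustration; a genuinely random paintbox would first have to be generated by some other routine.

```python
import random
from itertools import groupby

def block_of(x, intervals):
    """Block of the ordered partition containing x: an interval of U, or the
    singleton {x} when x falls in the complement of U."""
    for a, b in intervals:
        if a < x < b:
            return (a, b)
    return (x, x)

def composition_from_paintbox(intervals, points):
    """Occupancy numbers, from left to right, of the blocks hit by the sample.
    `intervals` lists the disjoint open component intervals of U."""
    blocks = [block_of(x, intervals) for x in sorted(points)]
    return tuple(len(list(group)) for _, group in groupby(blocks))

# Fixed illustration: U = (0, 0.3) u (0.5, 1), so [0.3, 0.5] consists of singletons.
U = [(0.0, 0.3), (0.5, 1.0)]
example = composition_from_paintbox(U, [0.9, 0.1, 0.4, 0.2, 0.6])   # (2, 1, 2)

# A uniform sample of size 5 from the same (here deterministic) paintbox.
rng = random.Random(0)
random_example = composition_from_paintbox(U, [rng.random() for _ in range(5)])
```

Sorting the sample first means that points falling in the same interval are adjacent, so grouping consecutive equal blocks yields the occupancy numbers in their natural order.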

From a topological viewpoint, the representation establishes a homeomorphism between the 
space of extreme composition structures and the compact space of open subsets of $[0, 1]$ (endowed
with a weak topology), and also identifies the generic composition structure with a unique
mixture of extremes. Thus the set of extremes is already intrinsically infinite-dimensional, to
say nothing of the mixtures. It is therefore a question of interest to find smaller parametric
families which admit a reasonably simple description. 

In this note we discuss briefly three such families: one is an ordered modification, due to
Donnelly and Joyce [?], of the ubiquitous Ewens sampling formula (corresponding to the $(0,\theta)$-partition
structure from the Ewens-Pitman two-parameter family [?]); another one, due to
Pitman [?], is an ordered (symmetric) version of the $(\alpha, 0)$-partition structure, and the third
composition structure is new. Despite the fact that the new composition structure is, in a sense,
constructed from the beta distributions, like the first two, the corresponding partition structure
does not fit in the Ewens-Pitman family. All three belong to a large infinite-dimensional family
of regenerative compositions introduced and characterised in [?] and all three are mixed, i.e.
generated by genuinely random paintboxes. 

2 Ordered ESF. In their encyclopaedic exposition of the multivariate Ewens distribution
Ewens and Tavaré presented the ordered version of the ESF (see [?], Eqn. 41.6),

$$e(\lambda) = \frac{\theta^{\ell}\, n!}{[\theta]_n} \prod_{j=1}^{\ell} \frac{1}{\Lambda_j} \tag{1}$$

in connection with a size-biased permutation of the Ewens partition structure (here and henceforth,
$[\theta]_n = \theta(\theta+1)\cdots(\theta+n-1)$ is the Pochhammer factorial). The special case $\theta = 1$ is well known to combinatorialists
as the distribution of cycle lengths in a uniform random permutation, provided the cycles are
ordered by increasing least elements.
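The statement about cycle lengths can be confirmed by brute force for small $n$. The sketch below assumes the reading of (1) in which, at $\theta = 1$, the formula reduces to $\prod_j 1/\Lambda_j$ with $\Lambda_j = \lambda_j + \cdots + \lambda_\ell$; all function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial

def cycle_lengths_by_least_element(perm):
    """Cycle lengths of perm (perm[i] is the image of i), with the cycles
    listed in increasing order of their least elements."""
    seen, lengths = set(), []
    for start in range(len(perm)):       # scanning 0, 1, 2, ... meets cycles by least element
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        lengths.append(length)
    return tuple(lengths)

def esf_theta_one(lam):
    """Formula (1) at theta = 1: the product of 1 / Lambda_j over the parts."""
    prob, tail = Fraction(1), sum(lam)
    for part in lam:
        prob /= tail        # tail equals Lambda_j at this point
        tail -= part
    return prob

n = 4
counts = Counter(cycle_lengths_by_least_element(p) for p in permutations(range(n)))
empirical = {lam: Fraction(c, factorial(n)) for lam, c in counts.items()}
matches = all(empirical[lam] == esf_theta_one(lam) for lam in empirical)
```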

Donnelly and Joyce [?] observed that the formula also defines a composition structure, i.e.
that (1) determines a consistent sequence of random partitions taken together with an intrinsic
ordering of classes, based neither on the sizes of classes nor on the labeling of 'balls in boxes'. They
argued that the ordered structure is of some significance for biological applications, and proved
the following paintbox representation of $e$.

Let $Z_j$ be independent random variables with beta density

$$d\omega(z) = \theta\,(1 - z)^{\theta-1}\, dz.$$

Let $U_e$ be the open set complementary to the stick-breaking sequence

$$1 - \prod_{j=1}^{k} (1 - Z_j), \qquad k = 1, 2, \ldots$$

taken together with the endpoints of $[0, 1]$. Then rephrasing Theorem 10 from [?] we have

Theorem 1 The composition structure $e$ can be derived from the paintbox $U_e$.



The proof of this result given in [?] relied on the twin fact about weak convergence of the
size-biased permutation of ESF. Next is a direct argument which offers some more insight and 
exemplifies the approach taken in this paper. 

Proof. Introduce the binomial moments

$$w(n:m) = \binom{n}{m}\int_0^1 z^m (1-z)^{n-m}\, d\omega(z) = \theta \binom{n}{m} B(m+1,\, n-m+\theta). \tag{2}$$

For $I$ the leftmost interval of $U_e$ (adjacent to 0) the size of $I$ equals $Z_1$. Denoting by $\hat e$ the
composition structure derived from $U_e$ we aim to show that $\hat e = e$.

The argument is based on two facts. Firstly, suppose $n$ uniform points have been sampled
from $[0, 1]$ and $I$ happened to contain $m$ sample points; then conditionally given $I$ the
configuration of the other $n - m$ points is as if it were a uniform sample from $I^c$. The second fact is that
given $I$ the set $U_e \setminus I$ is a scaled distributional copy of $U_e$, as is clear from the definition of
the paintbox via stick-breaking.

The composition $(\lambda_1, \ldots, \lambda_\ell)$ can only appear if the interval $J$, defined to be the leftmost of the
intervals of $U_e$ discovered by the sample, contains exactly $\lambda_1$ sample points. The chance that
$J$ coincides with $I$ and holds $m = \lambda_1$ sample points is $w(n : \lambda_1)$, and in this case the composition derived from the piece of $U_e$
to the right of $J$ must be $(\lambda_2, \ldots, \lambda_\ell)$. Otherwise $\lambda$ can appear only if $I$ contains no sample
points and all $n$ group within $U_e \setminus I$ in accord with $\lambda$.

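Combining these facts we get the equation

$$\hat e(\lambda_1, \ldots, \lambda_\ell) = w(n : \lambda_1)\, \hat e(\lambda_2, \ldots, \lambda_\ell) + w(n : 0)\, \hat e(\lambda_1, \ldots, \lambda_\ell)$$

leading to the recursion

$$\hat e(\lambda_1, \ldots, \lambda_\ell) = \frac{w(n : \lambda_1)}{1 - w(n : 0)}\, \hat e(\lambda_2, \ldots, \lambda_\ell),$$

which is solved as

$$\hat e(\lambda) = \prod_{j=1}^{\ell} q(\Lambda_j : \lambda_j) \tag{3}$$

where

$$q(n : m) := \frac{w(n:m)}{1 - w(n : 0)} = \frac{\theta}{n}\, \frac{n!}{(n-m)!}\, \frac{[\theta]_{n-m}}{[\theta]_n}. \tag{4}$$

Cancelling common factors we arrive at (1), thus $\hat e = e$. □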
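The cancellation in the last step can be checked with exact arithmetic. The sketch below is only a numerical sanity check under the reading of (1)-(4) adopted here, with $B(m+1, n-m+\theta) = m!/[n-m+\theta]_{m+1}$; the function names and the test value of $\theta$ are arbitrary choices.

```python
from fractions import Fraction
from math import comb, factorial

def rising(x, n):
    """Pochhammer factorial [x]_n = x (x+1) ... (x+n-1)."""
    out = Fraction(1)
    for k in range(n):
        out *= x + k
    return out

def compositions(n):
    """All compositions of n as tuples of positive integers."""
    if n == 0:
        yield ()
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

def w(n, m, theta):
    """Binomial moment (2), using B(m+1, n-m+theta) = m! / [n-m+theta]_{m+1}."""
    return theta * comb(n, m) * factorial(m) / rising(n - m + theta, m + 1)

def q(n, m, theta):
    """First form of (4): w(n:m) / (1 - w(n:0))."""
    return w(n, m, theta) / (1 - w(n, 0, theta))

def tails(lam):
    """Lambda_j = lam[j] + ... + lam[-1] for every j."""
    return [sum(lam[j:]) for j in range(len(lam))]

def esf_product(lam, theta):
    """Right-hand side of (3)."""
    out = Fraction(1)
    for part, tail in zip(lam, tails(lam)):
        out *= q(tail, part, theta)
    return out

def esf_closed(lam, theta):
    """Formula (1): theta^l n! / [theta]_n times the product of 1 / Lambda_j."""
    out = theta ** len(lam) * factorial(sum(lam)) / rising(theta, sum(lam))
    for tail in tails(lam):
        out /= tail
    return out

theta = Fraction(3, 2)      # arbitrary test value
agree = all(esf_product(lam, theta) == esf_closed(lam, theta) for lam in compositions(6))
total = sum(esf_closed(lam, theta) for lam in compositions(6))
```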



There is a canonical correspondence between composition structures and probability distributions
of exchangeable compositions of the infinite set $\{1, 2, \ldots\}$ (see [?]). In terms of the
paintbox representation the composition derived from $U$ is obtained by sampling infinitely
many uniform points and then assigning objects $i$ and $j$ to distinct classes if the closed interval
spanned by the $i$th and the $j$th sample points has a nonempty intersection with $U^c$.

The infinite composition associated with $e$, call it $\mathcal{E}$, has a simply ordered collection of
blocks, and the law of large numbers says that the asymptotic frequencies of the blocks (in a
growing sample) coincide with the sizes of stick-breaking residuals, from left to right.
When we view $\mathcal{E}$ from the perspective of its restrictions $\mathcal{E}_n$ to the finite sets $\{1, \ldots, n\}$, the collection
of blocks stabilises (with probability one) in the sense that for any $k$ no new block appearing
in $\mathcal{E}_{n'}$, for $n' > n$, will interlace with the collection of the first $k$ blocks represented in $\mathcal{E}_n$,
provided $n$ is sufficiently large (a zero-one law). Compositions with this property were called
'representable' in [?] and the class of such compositions generated by a general stick-breaking
paintbox was characterised in [?].



3 PSF. Pitman's composition structure is given by Eqn. (30) in [?]:

$$p(\lambda) = \frac{n!\, \alpha^{\ell}}{[\alpha]_n} \prod_{j=1}^{\ell} \frac{[1-\alpha]_{\lambda_j - 1}}{\lambda_j!}, \qquad 0 < \alpha < 1. \tag{5}$$

This sampling formula was derived from the following paintbox representation.

Theorem 2 The paintbox $U_p$ for $p$ is the union of excursion intervals of the Bessel bridge
of dimension $2 - 2\alpha$.



Equivalently, the complement $U_p^c$ is the set of zeroes of the Bessel bridge on $[0, 1]$. The case
$\alpha = 1/2$ corresponds to the Brownian bridge.

In fact, $p$ is a conditional version of another composition structure of Pitman, $p'$, derived from
the set of zeroes of a Bessel process (which has a final meander interval adjacent to the right endpoint
of $[0, 1]$). Pitman obtained a formula for $p'$ akin to (5) (see [?], Eqn. (28)) using self-similarity
of the Bessel process and the distribution of the length of the meander interval. The relation between
the structures is that

$$p(\lambda) = \mathrm{const}(n)\, p'(\lambda, 1), \qquad \lambda \vdash n.$$

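Gnedin and Pitman [?] give a characterisation of $p$ related to the observation that this
composition structure is also of the product form (similar to (3)) with

$$q(n : m) = -\frac{\binom{\alpha}{m}\binom{-\alpha}{n-m}}{\binom{-\alpha}{n}} = \binom{n}{m} \frac{\alpha\, [1-\alpha]_{m-1}\, [\alpha]_{n-m}}{[\alpha]_n}.$$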

For $\ell$ fixed, $p$ is a symmetric function of the parts. This reflects the fact that $U_p$ is symmetric, that
is, has component intervals 'in random order' (in [?] the open sets with this kind of invariance
are called 'exchangeable interval partitions'). Summing $p(\lambda)$ over distinct permutations of parts
yields a function on integer partitions which is the $(\alpha, 0)$-partition structure from the Ewens-Pitman
family. It follows that $p$ could be obtained from the partition structure by permuting
the parts in uniform random order (this is the general device that allows one to derive symmetric
composition structures and symmetric open sets from their unordered relatives [?]).
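The symmetry of $p$ and the passage to the partition structure can be illustrated numerically. The sketch below assumes the reading of (5) as $p(\lambda) = \frac{n!\,\alpha^{\ell}}{[\alpha]_n}\prod_j [1-\alpha]_{\lambda_j-1}/\lambda_j!$; the function names and the value of $\alpha$ are arbitrary.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def rising(x, n):
    """Pochhammer factorial [x]_n = x (x+1) ... (x+n-1)."""
    out = Fraction(1)
    for k in range(n):
        out *= x + k
    return out

def compositions(n):
    """All compositions of n as tuples of positive integers."""
    if n == 0:
        yield ()
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

def psf(lam, alpha):
    """Formula (5) as read above."""
    n = sum(lam)
    out = factorial(n) * alpha ** len(lam) / rising(alpha, n)
    for part in lam:
        out *= rising(1 - alpha, part - 1) / factorial(part)
    return out

def partition_probability(parts, alpha):
    """Sum of p over the distinct orderings of the given parts."""
    return sum(psf(lam, alpha) for lam in set(permutations(parts)))

alpha = Fraction(1, 3)      # arbitrary test value in (0, 1)
total = sum(psf(lam, alpha) for lam in compositions(6))
symmetric = all(psf(perm, alpha) == psf(lam, alpha)
                for lam in compositions(6) for perm in permutations(lam))
```

Because $p$ is symmetric, `partition_probability` is simply $p(\lambda)$ multiplied by the number of distinct orderings of the parts.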

The blocks of Pitman's composition $\mathcal{P}$ on $\{1, 2, \ldots\}$ are ordered like the set of rational
numbers and as such cannot be labeled by integers consistently with their intrinsic order. This
happens each time a composition has infinitely many blocks (almost surely) and is symmetric.
A consequence is that the infinite composition $\mathcal{P}$ has no definite first, second, etc. or last
block; in particular the first (hence the $k$th) block in $\mathcal{P}_n$ does not stabilise as $n$ grows.

4 A new sampling formula. Here is a new composition structure

$$g(\lambda) = \frac{n!}{[\theta]_n} \prod_{j=1}^{\ell} \frac{1}{\lambda_j\, h_\theta(\Lambda_j)}, \qquad \theta > 0, \tag{6}$$

where

$$h_\theta(n) = \sum_{k=1}^{n} \frac{1}{\theta + k - 1}$$

are the generalised harmonic numbers, which coincide with the partial sums of the harmonic
series when $\theta = 1$.
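A quick sanity check of the new formula, sketched under the reading $g(\lambda) = \frac{n!}{[\theta]_n}\prod_j \frac{1}{\lambda_j h_\theta(\Lambda_j)}$ with $h_\theta(n) = \sum_{k=1}^n 1/(\theta+k-1)$: the code verifies for small $n$ that $g$ sums to one and that deleting a uniformly chosen ball maps the law for $n$ to the law for $n-1$. Function names and the value of $\theta$ are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial

def rising(x, n):
    """Pochhammer factorial [x]_n = x (x+1) ... (x+n-1)."""
    out = Fraction(1)
    for k in range(n):
        out *= x + k
    return out

def compositions(n):
    """All compositions of n as tuples of positive integers."""
    if n == 0:
        yield ()
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

def h(n, theta):
    """Generalised harmonic number: sum of 1 / (theta + k - 1) for k = 1..n."""
    return sum(Fraction(1) / (theta + k - 1) for k in range(1, n + 1))

def g(lam, theta):
    """Formula (6) as read above."""
    out = factorial(sum(lam)) / rising(theta, sum(lam))
    for j, part in enumerate(lam):
        out /= part * h(sum(lam[j:]), theta)     # sum(lam[j:]) is Lambda_j
    return out

def g_law(n, theta):
    return {lam: g(lam, theta) for lam in compositions(n)}

def remove_one_ball(law_n, n):
    """Law of the composition left after deleting one of n balls uniformly at random."""
    reduced = {}
    for lam, prob in law_n.items():
        for j, part in enumerate(lam):
            shrunk = (part - 1,) if part > 1 else ()   # an emptied box disappears
            smaller = lam[:j] + shrunk + lam[j + 1:]
            reduced[smaller] = reduced.get(smaller, 0) + prob * Fraction(part, n)
    return reduced

theta = Fraction(5, 2)      # arbitrary test value
normalised = all(sum(g_law(n, theta).values()) == 1 for n in range(1, 8))
consistent = all(remove_one_ball(g_law(n, theta), n) == g_law(n - 1, theta)
                 for n in range(2, 8))
```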

To explain the genesis of the formula consider stick-breaking with the general beta density

$$d\omega(z) = \mathrm{const} \cdot z^{a-1} (1 - z)^{\theta-1}\, dz, \qquad a, \theta > 0. \tag{7}$$

The resulting paintbox generates a composition structure given by the RHS of (3) with

$$q(n : m) = \binom{n}{m} \frac{[a]_m\, [\theta]_{n-m}}{[a + \theta]_n - [\theta]_n} \tag{8}$$

where (8) is obtained like (4) from the binomial moments of the beta density (7) (to see this
just follow the lines in the proof of Theorem 1).
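The following sketch evaluates the decrement probabilities for the general beta density, read here as $q(n:m) = \binom{n}{m}[a]_m[\theta]_{n-m}/([a+\theta]_n-[\theta]_n)$, and checks two things: that they sum to one over $m = 1,\ldots,n$, and that the choice $a = 1$ reproduces the ESF expression $\frac{\theta}{n}\frac{n!}{(n-m)!}\frac{[\theta]_{n-m}}{[\theta]_n}$ from Section 2. Names and parameter values are assumptions for illustration.

```python
from fractions import Fraction
from math import comb, factorial

def rising(x, n):
    """Pochhammer factorial [x]_n = x (x+1) ... (x+n-1)."""
    out = Fraction(1)
    for k in range(n):
        out *= x + k
    return out

def q_beta(n, m, a, theta):
    """Decrement probability for beta(a, theta) stick-breaking, as read above."""
    return (comb(n, m) * rising(a, m) * rising(theta, n - m)
            / (rising(a + theta, n) - rising(theta, n)))

def q_esf(n, m, theta):
    """ESF decrement probability: (theta/n) n!/(n-m)! [theta]_{n-m}/[theta]_n."""
    return (theta * Fraction(factorial(n), n * factorial(n - m))
            * rising(theta, n - m) / rising(theta, n))

a, theta = Fraction(3, 4), Fraction(5, 2)       # arbitrary test values
sums_to_one = all(sum(q_beta(n, m, a, theta) for m in range(1, n + 1)) == 1
                  for n in range(1, 9))
esf_case = all(q_beta(n, m, 1, theta) == q_esf(n, m, theta)
               for n in range(1, 9) for m in range(1, n + 1))
```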

For general $a$ and $\theta$ the induced composition structure cannot be expressed by a simple
product formula, because the denominator has no good factorisation. One notable exception is
the ESF, which appears when $a = 1$. Another exception is the case $a = 0$, giving rise to $g$; but this
should be interpreted properly because the measure $\omega$ becomes infinite.



Theorem 3 When $a \downarrow 0$ the stick-breaking composition structure directed by the beta density
(7) converges to $g$.

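Proof. Expansions in powers of $a$ start with

$$[a]_m = a\,(m-1)! + \ldots, \qquad [a + \theta]_n - [\theta]_n = a\,[\theta]_n\, h_\theta(n) + \ldots$$

therefore when $a$ approaches 0 we get

$$q(n : m) = \frac{n!}{(n-m)!\; m}\, \frac{[\theta]_{n-m}}{[\theta]_n\, h_\theta(n)} \tag{9}$$

which yields $g$ as in (6). □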
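The limit can be observed numerically. The sketch below compares the beta stick-breaking decrement probabilities, read as $\binom{n}{m}[a]_m[\theta]_{n-m}/([a+\theta]_n-[\theta]_n)$, at a very small value of $a$ with the limiting expression $\frac{n!}{(n-m)!\,m}\frac{[\theta]_{n-m}}{[\theta]_n h_\theta(n)}$. This is an illustration under those readings rather than a proof, and the tolerance is an arbitrary choice.

```python
from fractions import Fraction
from math import comb, factorial

def rising(x, n):
    """Pochhammer factorial [x]_n = x (x+1) ... (x+n-1)."""
    out = Fraction(1)
    for k in range(n):
        out *= x + k
    return out

def h(n, theta):
    """Generalised harmonic number: sum of 1 / (theta + k - 1) for k = 1..n."""
    return sum(Fraction(1) / (theta + k - 1) for k in range(1, n + 1))

def q_beta(n, m, a, theta):
    """Decrement probability for beta(a, theta) stick-breaking."""
    return (comb(n, m) * rising(a, m) * rising(theta, n - m)
            / (rising(a + theta, n) - rising(theta, n)))

def q_limit(n, m, theta):
    """Limiting decrement probability as a tends to 0."""
    return (Fraction(factorial(n), factorial(n - m) * m)
            * rising(theta, n - m) / (rising(theta, n) * h(n, theta)))

theta = Fraction(5, 2)          # arbitrary test value
a = Fraction(1, 10**9)          # a small positive value standing in for the limit
max_gap = max(abs(q_beta(n, m, a, theta) - q_limit(n, m, theta))
              for n in range(1, 8) for m in range(1, n + 1))
```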

The distribution (9) underlying $g$ is especially simple for $\theta = 1$, when it gives a weight proportional
to $m^{-1}$ to each $m = 1, \ldots, n$.

To determine the paintbox representation for $g$ we will extend the classical stick-breaking
procedure by embedding the process into continuous time and allowing infinitely many breaks
within any time interval. Note that in defining a composition structure via the RHS of (3), through
the binomial moments of some measure $\omega$ and

$$q(n : m) = \frac{w(n : m)}{w(n : 1) + \cdots + w(n : n)},$$

we need not require that the measure $\omega$ be finite; we need only impose the condition

$$\int_0^1 z\, d\omega(z) < \infty$$

to have all binomial moments finite for $1 \le m \le n < \infty$.

In particular, our $g$ appears when we take the improper density

$$d\omega(z) = z^{-1} (1 - z)^{\theta-1}\, dz \tag{10}$$

(see [?] for more examples). For this $\omega$ consider a planar Poisson process (PPP) in the infinite
strip $[0, \infty) \times [0, 1]$ with Lebesgue $\times\, \omega$ as intensity measure. The PPP has countably many
atoms $(\tau_j, \xi_j)$ (we adopt the conventional fake labeling of atoms, which is not intended to say
that $\tau_j$ or $\xi_j$ is a definite random variable for a particular $j$), and each location on the abscissa is
a concentration point for the set of atoms. Define a pure-jump process with increasing càdlàg
paths

$$S_t = 1 - \prod_{(\tau_j, \xi_j):\, \tau_j \le t} (1 - \xi_j)$$

where the product is over all PPP atoms to the left of $t$. For any $t$ the product converges
because $z\,\omega(dz)$ is a finite measure. The process $(S_t)$ is a geometric subordinator: for $t' > t$ the
ratio $(1 - S_{t'})/(1 - S_t)$ is independent of the partial path on $[0, t]$ and has the same distribution as
$1 - S_{t'-t}$.

(The reader who is more comfortable with breaking sticks from right to left should
translate the paintbox formulas using the involution $z \leftrightarrow 1 - z$ and also mirror the sampling formulas.)

Theorem 4 The paintbox $U_g$ representing $g$ is the complement to the closure of the random
set $\{S_t : t \ge 0\}$, which is the range of the geometric subordinator.

Proof. Fix $\lambda \vdash n$ and consider a uniform sample of size $n$. The composition $\lambda$ appears when
for some $\tau_j$ the interval $[0, S_{\tau_j}]$ contains $m = \lambda_1$ sample points grouped in one component interval of
$U_g \cap [0, S_{\tau_j}]$ and the composition on the remaining $n - m$ sample points is $(\lambda_2, \ldots, \lambda_\ell)$. From
the properties of the uniform distribution, and because the PPP is ruled by a product measure, it follows
that the composition structure induced by $U_g$ is of the product form as in (3), and we only need
to justify the formula (9) for $q$, which is the distribution of the first part of a composition of $n$.

To that end, let $\pi(t)$ be the probability that some $m$ sample points group in one interval of
$U_g \cap [0, S_t]$ and denote by $\varepsilon_1, \ldots, \varepsilon_n$ the increasing order statistics of the uniform sample. Considering
a small time interval $[0, dt]$ it is not hard to see that $\pi$ satisfies the differential equation

$$\pi' = -a\pi + b, \qquad \pi(0) = 0$$

with constant coefficients

$$a = E\,\omega[\varepsilon_1, 1] = w(n : 1) + \cdots + w(n : n) \quad \text{and} \quad b = E\,\omega[\varepsilon_m, \varepsilon_{m+1}] = w(n : m)$$

(with 1 in place of $\varepsilon_{m+1}$ in case $m = n$), where the $w(n : m)$'s are the binomial moments of (10).
Solving the equation we obtain $\pi(t) = (b/a)(1 - e^{-at}) \to b/a$ as $t \to \infty$, whence
$q(n : m) = b/a$ and this is (9). □
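The two coefficients in the proof can be computed exactly. Assuming the improper density (10) is $z^{-1}(1-z)^{\theta-1}dz$, its binomial moments are $w(n:m) = \binom{n}{m}B(m, n-m+\theta) = \binom{n}{m}(m-1)!/[n-m+\theta]_m$. The sketch below checks that their sum over $m$ equals the generalised harmonic number $h_\theta(n)$ and that the ratio $b/a$ agrees with the limiting decrement probability $\frac{n!}{(n-m)!\,m}\frac{[\theta]_{n-m}}{[\theta]_n h_\theta(n)}$. All names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

def rising(x, n):
    """Pochhammer factorial [x]_n = x (x+1) ... (x+n-1)."""
    out = Fraction(1)
    for k in range(n):
        out *= x + k
    return out

def h(n, theta):
    """Generalised harmonic number: sum of 1 / (theta + k - 1) for k = 1..n."""
    return sum(Fraction(1) / (theta + k - 1) for k in range(1, n + 1))

def w_improper(n, m, theta):
    """Binomial moment of z^{-1} (1-z)^{theta-1} dz, valid for 1 <= m <= n."""
    return comb(n, m) * factorial(m - 1) / rising(n - m + theta, m)

def q_from_moments(n, m, theta):
    """b / a in the notation of the proof."""
    a = sum(w_improper(n, k, theta) for k in range(1, n + 1))
    return w_improper(n, m, theta) / a

def q_closed(n, m, theta):
    """Limiting decrement probability in closed form."""
    return (Fraction(factorial(n), factorial(n - m) * m)
            * rising(theta, n - m) / (rising(theta, n) * h(n, theta)))

theta = Fraction(5, 2)      # arbitrary test value
total_rate_is_harmonic = all(
    sum(w_improper(n, k, theta) for k in range(1, n + 1)) == h(n, theta)
    for n in range(1, 9))
ratio_matches = all(q_from_moments(n, m, theta) == q_closed(n, m, theta)
                    for n in range(1, 9) for m in range(1, n + 1))
```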

The infinite composition $\mathcal{G}$ associated with $g$ has infinitely many blocks, and the set of blocks
is order isomorphic to the set of rational numbers. Unlike Pitman's $\mathcal{P}$ it is not symmetric, i.e. $g$
is sensitive to permutation of the parts $\lambda_j$ when $\ell > 1$, and the representing paintbox $U_g$ is not an
'exchangeable interval partition'. A combinatorialist might find it natural to view $g$ as a function
on Young diagrams $(\Lambda_1, \ldots, \Lambda_\ell)$ with strictly decreasing parts.

Ignoring the order in $\mathcal{G}$ yields a novel partition structure. For no $\theta$ does this partition
structure belong to the Ewens-Pitman two-parameter family, which had covered practically all explicit
sampling formulas known to date. The distinction can be seen immediately by comparing the
probability of the one-class partition, our $g(n) = q(n : n)$ given by (9), with the analogous quantity
computed via Eqn. (16) in [?] (the formulas do not match for n > 4 whatever the values of
parameters).

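Taking other integer values of $a$ in (7) leads to formulas involving products of stereotypic polynomial
factors, e.g. for $a = 2$ we have

$$g_2(\lambda) = \frac{n!\, (\theta(\theta+1))^{\ell}}{[\theta]_n} \prod_{j=1}^{\ell} \frac{\lambda_j + 1}{\Lambda_j (\Lambda_j + 2\theta + 1)}.$$

The resulting infinite compositions have simply ordered blocks and thus are more in line with $\mathcal{E}$.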
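For $a = 2$ the product of decrement probabilities can be compared with a closed form. The expression used below, $g_2(\lambda) = \frac{n!\,(\theta(\theta+1))^{\ell}}{[\theta]_n}\prod_j \frac{\lambda_j+1}{\Lambda_j(\Lambda_j+2\theta+1)}$, is a reconstruction obtained by factoring $[2+\theta]_n - [\theta]_n = [\theta]_n\, n(n+2\theta+1)/(\theta(\theta+1))$ and should be treated as an assumption; the code only confirms that it agrees with the product form for small $n$.

```python
from fractions import Fraction
from math import comb, factorial

def rising(x, n):
    """Pochhammer factorial [x]_n = x (x+1) ... (x+n-1)."""
    out = Fraction(1)
    for k in range(n):
        out *= x + k
    return out

def compositions(n):
    """All compositions of n as tuples of positive integers."""
    if n == 0:
        yield ()
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

def q_beta(n, m, a, theta):
    """Decrement probability for beta(a, theta) stick-breaking."""
    return (comb(n, m) * rising(a, m) * rising(theta, n - m)
            / (rising(a + theta, n) - rising(theta, n)))

def product_form(lam, a, theta):
    """Product of q(Lambda_j : lam_j) over the parts."""
    out = Fraction(1)
    for j, part in enumerate(lam):
        out *= q_beta(sum(lam[j:]), part, a, theta)
    return out

def g2_closed(lam, theta):
    """Reconstructed closed form for a = 2 (an assumption, see the lead-in)."""
    out = (factorial(sum(lam)) * (theta * (theta + 1)) ** len(lam)
           / rising(theta, sum(lam)))
    for j, part in enumerate(lam):
        tail = sum(lam[j:])                     # Lambda_j
        out *= Fraction(part + 1) / (tail * (tail + 2 * theta + 1))
    return out

theta = Fraction(5, 2)      # arbitrary test value
agree = all(product_form(lam, 2, theta) == g2_closed(lam, theta)
            for lam in compositions(6))
total = sum(g2_closed(lam, theta) for lam in compositions(6))
```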